Investigating the Working of Text Classifiers

Authors

  • Devendra Singh Sachan
  • Manzil Zaheer
  • Ruslan Salakhutdinov

Abstract

Text classification is one of the most widely studied tasks in natural language processing. Recently, increasingly large multilayer neural network models have been employed for the task, motivated by the principle of compositionality. Almost all of the reported methods use discriminative approaches. Discriminative approaches come with a caveat: without proper capacity control, a model might latch on to any signal, even one that does not generalize. Using various state-of-the-art text classifiers, we explore whether the models actually learn to compose the meaning of sentences or still rely on a few key lexicons. To test our hypothesis, we construct datasets in which the train and test splits have no direct overlap of such lexicons. We study various text classifiers and observe a large performance drop on these datasets. Finally, we show that even simple regularization techniques can improve performance on these datasets.
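The abstract does not say how the lexicon-disjoint splits are built, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a keyword lexicon is already given (for example, a list of sentiment words), randomly partitions that lexicon into a train-only group and a test-only group, and routes each document by the keywords it contains. The names `lexicon_disjoint_split` and `tokenize`, the regex tokenizer, and the choice to drop documents that mix both groups are all hypothetical illustration choices.

```python
import random
import re
from typing import List, Set, Tuple

Example = Tuple[str, int]  # (text, label)


def tokenize(text: str) -> Set[str]:
    # Hypothetical tokenizer: lowercase, keep alphabetic tokens (apostrophes allowed).
    return set(re.findall(r"[a-z']+", text.lower()))


def lexicon_disjoint_split(
    docs: List[Example],
    keywords: Set[str],
    test_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Split documents so train and test share no keyword (illustrative sketch)."""
    # Randomly partition the keyword lexicon into test-only and train-only groups.
    ordered = sorted(keywords)
    random.Random(seed).shuffle(ordered)
    cut = int(len(ordered) * test_fraction)
    test_kw, train_kw = set(ordered[:cut]), set(ordered[cut:])

    train: List[Example] = []
    test: List[Example] = []
    for text, label in docs:
        found = tokenize(text) & keywords
        if found and found <= test_kw:
            test.append((text, label))
        elif found <= train_kw:  # also catches documents with no keyword at all
            train.append((text, label))
        # Documents containing keywords from both groups are discarded.
    return train, test


if __name__ == "__main__":
    data = [
        ("A great and moving film", 1),
        ("Terrible plot, awful acting", 0),
        ("An excellent cast", 1),
        ("Boring and dull", 0),
        ("Great cast but boring story", 1),
    ]
    sentiment_words = {"great", "terrible", "awful", "excellent", "boring", "dull"}
    train, test = lexicon_disjoint_split(data, sentiment_words)
    print(len(train), "train /", len(test), "test")
```

Discarding mixed documents is what guarantees that no keyword seen during training reappears at test time; a classifier that merely memorizes those keywords then has nothing to match on the test split.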

Journal:
  • CoRR

Volume: abs/1801.06261

Publication date: 2018